(54) Method and apparatus for providing failure detection and recovery with predetermined 
degree of replication for distributed applications in a network 



(57) An application module (A) running on a host 
computer in a computer network is failure-protected with 
one or more backup copies that are operative on other 
host computers in the network. In order to effect fault 
protection, the application module registers itself with a 
Replica Manager daemon process (112) by sending a 
registration message, which message, in addition to 
identifying the registering application module and the 
host computer on which it is running, includes the particular 
replication strategy (cold backup, warm backup, 
or hot backup) and the degree of replication associated 
with that application module. The backup copies are 
then maintained in a fail-over state according to the registered 
replication strategy. A Watch Dog daemon (113), 
running on the same host computer as the registered 
application, periodically monitors the registered application 
to detect failures. When a failure, such as a crash 
or hangup of the application module, is detected, the failure 
is reported to the ReplicaManager, which effects the 
requested fail-over actions. An additional backup copy 
is then made operative in accordance with the registered 
replication style and the registered degree of replication. 
A SuperWatchDog daemon process (115-1), 
running on the same host computer as the ReplicaManager, 
monitors each host computer in the computer network. 
When a host failure is detected, each application 
module running on that host computer is individually 
failure-protected in accordance with its registered 
replication style and degree of replication. 
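
The registration and fail-over bookkeeping described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class and method names (Registration, ReplicaManager, register, report_failure, report_host_failure) are hypothetical, the degree of replication is assumed to mean the number of backup copies kept alongside the running copy, and the way new backup hosts are chosen (first free host in sorted order) is an arbitrary placeholder. The replication strategy is only recorded; how a cold, warm, or hot backup is actually kept ready is outside the sketch.

```python
from dataclasses import dataclass, field
from enum import Enum


class Strategy(Enum):
    COLD = "cold"
    WARM = "warm"
    HOT = "hot"


@dataclass
class Registration:
    module: str          # registering application module
    host: str            # host computer running the operative copy
    strategy: Strategy   # recorded only; not acted on in this sketch
    degree: int          # ASSUMPTION: number of backup copies to maintain
    backups: list = field(default_factory=list)  # hosts holding backup copies


class ReplicaManager:
    """Hypothetical bookkeeping for registrations and fail-over."""

    def __init__(self, hosts):
        self.live_hosts = set(hosts)
        self.registry = {}

    def _restore_degree(self, reg):
        # Add backups on unused live hosts until the registered degree is met.
        used = {reg.host, *reg.backups}
        for host in sorted(self.live_hosts - used):
            if len(reg.backups) >= reg.degree:
                break
            reg.backups.append(host)

    def register(self, module, host, strategy, degree):
        # Corresponds to the registration message sent by the module.
        reg = Registration(module, host, strategy, degree)
        self.registry[module] = reg
        self._restore_degree(reg)
        return reg

    def report_failure(self, module):
        # Called when a watchdog reports a crash or hang of the module.
        reg = self.registry[module]
        if not reg.backups:
            raise RuntimeError(f"no backup copy left for {module}")
        reg.host = reg.backups.pop(0)   # fail over to the first backup
        self._restore_degree(reg)       # bring up an additional backup
        return reg

    def report_host_failure(self, host):
        # Called when host-level monitoring reports a failed host; every
        # module touching that host is handled individually.
        self.live_hosts.discard(host)
        for reg in self.registry.values():
            if host in reg.backups:
                reg.backups.remove(host)
            if reg.host == host:
                self.report_failure(reg.module)
            else:
                self._restore_degree(reg)


if __name__ == "__main__":
    rm = ReplicaManager(["H1", "H2", "H3", "H4"])
    reg = rm.register("A", "H1", Strategy.WARM, degree=2)
    rm.report_failure("A")          # module-level failure on H1
    rm.report_host_failure("H2")    # host-level failure
    print(reg.host, reg.backups)
```

In the sketch, a module-level failure leaves the original host eligible to hold a new backup, since only the module failed, while a host-level failure removes the host from consideration for every registered module.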